We show how the inherent, but often neglected, properties of large-scale LiDAR point clouds can be exploited for effective self-supervised representation learning. To this end, we design a highly data-efficient feature pre-training backbone that significantly reduces the amount of tedious 3D annotation needed to train state-of-the-art object detectors. In particular, we propose a Masked AutoEncoder (MAELi) that intuitively exploits the sparsity of LiDAR point clouds in both the encoder and the decoder during reconstruction. This results in more expressive and useful features that are directly applicable to downstream perception tasks, such as 3D object detection for autonomous driving. In a novel reconstruction scheme, MAELi distinguishes between free and occluded space and leverages a new masking strategy that targets the LiDAR's inherent spherical projection. To demonstrate the potential of MAELi, we pre-train one of the most widespread 3D backbones in an end-to-end fashion and show the merit of our fully unsupervised pre-trained features on several 3D object detection architectures. Given only a tiny fraction of labeled frames to fine-tune such detectors, we achieve significant performance improvements. For example, with only $\sim800$ labeled frames, MAELi features improve a SECOND model by +10.09 APH/LEVEL 2 on Waymo Vehicles.
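To make "masking that targets the LiDAR's spherical projection" concrete, the sketch below bins points by azimuth and elevation angle and drops whole angular cells rather than individual points, so masked regions look like missing beams or sectors. This is a hypothetical illustration, not MAELi's masking strategy: the function name, the grid resolution, and the mask ratio are all assumptions.

```python
import numpy as np

def spherical_grid_mask(points, n_azimuth=64, n_elevation=16, mask_ratio=0.5, seed=0):
    """Return a boolean keep-mask over points, dropping whole angular cells.

    points: (N, 3) array of x, y, z coordinates in the sensor frame.
    Grid resolution and mask_ratio are illustrative assumptions, not MAELi's settings.
    """
    rng = np.random.default_rng(seed)
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    azimuth = np.arctan2(y, x)  # range [-pi, pi]
    elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))  # range [-pi/2, pi/2]

    # Bin both angles into a 2D grid of cells on the unit sphere.
    az_idx = np.clip(((azimuth + np.pi) / (2 * np.pi) * n_azimuth).astype(int), 0, n_azimuth - 1)
    el_idx = np.clip(((elevation + np.pi / 2) / np.pi * n_elevation).astype(int), 0, n_elevation - 1)
    cell_id = el_idx * n_azimuth + az_idx

    # Pick a random subset of cells; every point falling into a masked cell is removed.
    n_cells = n_azimuth * n_elevation
    masked_cells = rng.choice(n_cells, size=int(mask_ratio * n_cells), replace=False)
    return ~np.isin(cell_id, masked_cells)
```

Because the mask is defined on angular cells, points that share a viewing direction are always kept or dropped together, which is the property a projection-aware masking scheme is meant to have.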
Existing Multiple Object Tracking (MOT) methods design complex architectures for better tracking performance. However, without a proper organization of input information, they still fail to track robustly and suffer from frequent identity switches. In this paper, we propose two novel methods together with a simple online Message Passing Network (MPN) to address these limitations. First, we explore different integration methods for the graph node and edge embeddings and propose a new IoU (Intersection over Union)-guided function, which improves long-term tracking and handles identity switches. Second, we introduce a hierarchical sampling strategy to construct sparser graphs, which allows the training to focus on more difficult samples. Experimental results demonstrate that a simple online MPN with these two contributions can outperform many state-of-the-art methods. In addition, our association method generalizes well and can also improve the results of methods based on private detections.
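The IoU cue at the heart of the proposed guidance is standard to compute. The sketch below builds a track-to-detection IoU matrix and keeps only overlapping pairs as candidate edges, one simple way an IoU signal could shape a sparser association graph. The pruning rule and the `candidate_edges` helper are hypothetical and not the paper's IoU-guided function.

```python
import numpy as np

def pairwise_iou(boxes_a, boxes_b):
    """IoU matrix between two sets of boxes in (x1, y1, x2, y2) format."""
    a = np.asarray(boxes_a, dtype=float)[:, None, :]  # shape (A, 1, 4)
    b = np.asarray(boxes_b, dtype=float)[None, :, :]  # shape (1, B, 4)
    ix1 = np.maximum(a[..., 0], b[..., 0])
    iy1 = np.maximum(a[..., 1], b[..., 1])
    ix2 = np.minimum(a[..., 2], b[..., 2])
    iy2 = np.minimum(a[..., 3], b[..., 3])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_b = (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    union = area_a + area_b - inter
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)

def candidate_edges(track_boxes, det_boxes, min_iou=0.0):
    """Hypothetical pruning: keep (track, detection, iou) edges whose IoU exceeds min_iou."""
    iou = pairwise_iou(track_boxes, det_boxes)
    rows, cols = np.nonzero(iou > min_iou)
    return [(int(i), int(j), float(iou[i, j])) for i, j in zip(rows, cols)]
```

For example, boxes `(0, 0, 2, 2)` and `(1, 1, 3, 3)` overlap in a 1×1 square out of a union of 7, giving an IoU of 1/7.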
Although action recognition systems can achieve top performance when evaluated on in-distribution test points, they are vulnerable to unanticipated distribution shifts in test data. However, test-time adaptation of video action recognition models against common distribution shifts has so far not been demonstrated. We propose to address this problem with an approach tailored to spatio-temporal models that is capable of adapting on a single video sample at each step. It consists of a feature distribution alignment technique that aligns online estimates of test set statistics towards the training statistics. We further enforce prediction consistency over temporally augmented views of the same test video sample. Evaluations on three benchmark action recognition datasets show that our proposed technique is architecture-agnostic and able to significantly boost the performance of both the state-of-the-art convolutional architecture TANet and the Video Swin Transformer. Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches in evaluations of both a single distribution shift and the challenging case of random distribution shifts. Code will be available at \url{https://github.com/wlin-at/ViTTA}.
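The alignment step can be sketched as follows: keep exponential moving averages of the test-time feature mean and variance, updated once per incoming video, and measure their distance to the stored training statistics. This NumPy version only computes the loss value; the momentum, the L1 distance, and the use of a single feature layer are assumptions, and in a real model the loss would be computed inside an autograd framework so it can be backpropagated. It is not the ViTTA code.

```python
import numpy as np

class OnlineFeatureAlignment:
    """Sketch: EMA estimates of test feature statistics, compared to training statistics."""

    def __init__(self, train_mean, train_var, momentum=0.1):
        self.train_mean = np.asarray(train_mean, float)
        self.train_var = np.asarray(train_var, float)
        self.momentum = momentum  # assumed value, controls how fast estimates track new videos
        self.mean = None
        self.var = None

    def update(self, features):
        """features: (num_tokens, C) features of one test video. Returns the alignment loss."""
        f = np.asarray(features, float)
        batch_mean, batch_var = f.mean(axis=0), f.var(axis=0)
        if self.mean is None:
            # The first video initializes the online estimates.
            self.mean, self.var = batch_mean, batch_var
        else:
            m = self.momentum
            self.mean = (1 - m) * self.mean + m * batch_mean
            self.var = (1 - m) * self.var + m * batch_var
        # L1 distance between online test statistics and training statistics.
        return float(np.abs(self.mean - self.train_mean).sum()
                     + np.abs(self.var - self.train_var).sum())
```

The moving average lets a single video per step contribute to a stable estimate of the test distribution instead of relying on a large test batch.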
Keyless entry systems in cars are adopting neural networks to localize their operators. Test-time adversarial defenses equip such systems with the ability to defend against adversarial attacks without prior training on adversarial samples. We propose a test-time adversarial example detector that detects adversarial inputs by quantifying the localized intermediate responses of a pre-trained neural network and the confidence scores of an auxiliary softmax layer. Furthermore, to make the network robust, we attenuate non-relevant features through non-iterative input sample clipping. Using our approach, mean performance over 15 levels of adversarial perturbation increases by 55.33% for the fast gradient sign method (FGSM) and by 6.3% for both the basic iterative method (BIM) and projected gradient descent (PGD).
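For reference, FGSM, the first attack in the evaluation, perturbs an input by a single step of size ε along the sign of the loss gradient, x_adv = x + ε·sign(∇ₓL). The minimal sketch below applies it to a toy logistic-regression classifier, chosen only because its input gradient has the closed form (p − y)·w; it is not the localization network from the paper. BIM and PGD repeat this step several times with projection back into the ε-ball.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(x, y, w, b, eps):
    """One FGSM step against p(y=1|x) = sigmoid(w.x + b) under binary cross-entropy.

    The input gradient of the BCE loss is (p - y) * w, so the attack moves each
    input coordinate by eps in the direction that increases the loss.
    """
    x, w = np.asarray(x, float), np.asarray(w, float)
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)
```

With `w = [1, -2]`, `b = 0`, `x = [0.5, 0.5]`, `y = 1` and `eps = 0.1`, the attack returns `[0.4, 0.6]`, which lowers the model's probability for the true class.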
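Although action recognition has achieved impressive results in recent years, the collection and annotation of video training data remain time-consuming and costly. Therefore, image-to-video adaptation has been proposed to exploit labeling-free web image sources for adapting to unlabeled target videos. This poses two major challenges: (1) the spatial domain shift between web images and video frames; (2) the modality gap between image and video data. To address these challenges, we propose Cycle Domain Adaptation (CycDA), a cycle-based approach for unsupervised image-to-video domain adaptation that, on the one hand, leverages the joint spatial information in images and videos and, on the other hand, trains an independent spatio-temporal model to bridge the modality gap. We alternate between spatial and spatio-temporal learning, with knowledge transfer between the two in each cycle. We evaluate our approach on benchmark datasets for image-to-video as well as for mixed-source domain adaptation, achieving state-of-the-art results and demonstrating the benefits of our cyclic adaptation.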
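Domain adaptation is crucial for adapting learned models to new scenarios, such as domain shifts or changing data distributions. Current approaches usually require a large amount of labeled or unlabeled data from the shifted domain. This can be a hurdle in fields that require continuous dynamic adaptation or suffer from data scarcity, e.g. autonomous driving in challenging weather conditions. To address the problem of continuous adaptation to distribution shifts, we propose Dynamic Unsupervised Adaptation (DUA). We modify the feature representations of the model by continuously adapting the statistics of the batch normalization layers. We show that by accessing only a tiny fraction of unlabeled data from the shifted domain and adapting sequentially, strong performance gains can be achieved. With even less than 1% of unlabeled data from the target domain, DUA already achieves competitive results compared to strong baselines. In addition, the computational overhead is minimal compared to previous approaches. Our method is simple yet effective and can be applied to any architecture that uses batch normalization as one of its components. We show the utility of DUA by evaluating it on a variety of domain adaptation datasets and tasks, including object recognition, digit recognition and object detection.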
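Although there exist semantic segmentation methods that perform very well on many medical datasets, they are generally not designed for direct use in clinical practice. The two main concerns are generalization to unseen data with a different visual appearance, e.g. images acquired with different scanners, and efficiency in terms of computation time and required graphics processing unit (GPU) memory. In this work, we use a multi-organ segmentation model based on the SpatialConfiguration-Net (SCN), which integrates prior knowledge of the spatial configuration among the labeled organs to resolve spurious responses in the network output. Furthermore, we modified the architecture of the segmentation model to reduce its memory footprint as much as possible without drastically impacting the quality of the predictions. Lastly, we implemented a minimal inference script for which we optimized both execution time and required GPU memory.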
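In the original SpatialConfiguration-Net formulation for landmark localization, the output is the element-wise product of a local-appearance component and a spatial-configuration component, so a response survives only where both agree. The toy sketch below shows how such a product suppresses a locally plausible but spatially implausible blob. That the segmentation model described here combines its components in exactly this way is an assumption, and the 1D arrays are made-up illustrative values.

```python
import numpy as np

def scn_combine(local_appearance, spatial_configuration):
    """Combine per-organ response maps multiplicatively.

    Both inputs: non-negative arrays of shape (num_organs, *spatial_dims).
    """
    return np.asarray(local_appearance, float) * np.asarray(spatial_configuration, float)

# Toy 1D example, one organ: true peak at index 2, spurious peak at index 7.
local = np.array([[0.0, 0.2, 0.9, 0.2, 0.0, 0.0, 0.1, 0.95, 0.1, 0.0]])
spatial = np.array([[0.1, 0.5, 1.0, 0.5, 0.1, 0.0, 0.0, 0.05, 0.0, 0.0]])
combined = scn_combine(local, spatial)
```

Here the local-appearance map alone peaks at the spurious index 7, while the combined map peaks at index 2 because the spatial component assigns almost no support to index 7.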
For conceptual design, engineers rely on conventional iterative (often manual) techniques. Emerging parametric models facilitate design space exploration based on quantifiable performance metrics, yet they remain time-consuming and computationally expensive. Pure optimisation methods, however, ignore qualitative aspects (e.g. aesthetics or construction methods). This paper provides a performance-driven design exploration framework that augments the human designer through a Conditional Variational Autoencoder (CVAE), which serves both as a forward performance predictor for given design features and as an inverse design feature predictor conditioned on a set of performance requests. The CVAE is trained on 18,000 synthetically generated instances of a pedestrian bridge in Switzerland. Sensitivity analysis is employed for explainability and to inform designers about (i) relations between features and/or performances learned by the model and (ii) structural improvements under user-defined objectives. A case study demonstrated our framework's potential to serve as a future co-pilot for conceptual design studies of pedestrian bridges and beyond.
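For readers unfamiliar with CVAEs, the training objective is a reconstruction term plus a KL term that keeps the approximate posterior close to a standard normal prior; the conditioning variables (here, performance requests) enter through the encoder and decoder networks, which are not shown. The sketch below implements only these generic pieces, assuming a Gaussian decoder (so reconstruction reduces to squared error up to a constant) and a `beta` weight; it is not the authors' model.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    mu, log_var = np.asarray(mu, float), np.asarray(log_var, float)
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

def cvae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Negative ELBO up to a constant: squared reconstruction error + beta * KL."""
    recon = np.sum((np.asarray(x, float) - np.asarray(x_recon, float)) ** 2, axis=-1)
    return recon + beta * gaussian_kl(mu, log_var)

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(np.shape(mu))
    return np.asarray(mu, float) + np.exp(0.5 * np.asarray(log_var, float)) * eps
```

For inverse design, one would typically draw `z` from the prior and decode it together with the requested performance values, producing several candidate feature sets for the same request.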
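The development of machine learning (ML) models is more than just a special case of software development (SD): ML models acquire properties and fulfill requirements even without direct human interaction, in a seemingly uncontrollable manner. Nonetheless, the underlying processes can be described formally. We define a comprehensive SD process model for ML that encompasses most tasks and artifacts described in the literature. Besides producing the necessary artifacts, we also focus on generating and validating fitting descriptions in the form of specifications. We stress the importance of further evolving the ML model throughout its life cycle, even after initial training and testing. Thus, we provide various interaction points with standard SD processes, in which ML is often an encapsulated task. Further, our SD process model allows ML to be formulated as a (meta-)optimization problem. If automated rigorously, it can be used to realize self-adaptive autonomous systems. Finally, our SD process model features a description of time that allows reasoning about progress within ML development processes. This might lead to further applications of formal methods within the field of ML.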
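We address the relatively unexplored problem of hyper-parameter optimization (HPO) for federated learning (FL-HPO). We introduce Federated Loss Surface Aggregation (FLoRA), the first FL-HPO solution framework that can address use cases of tabular data and gradient boosting training algorithms, in addition to the stochastic gradient descent/neural network settings commonly addressed in the FL literature. The framework enables single-shot FL-HPO by first identifying a good set of hyper-parameters that is then used in a **single** FL training. Thus, it enables FL-HPO solutions with minimal additional communication overhead compared to FL training without HPO. Our empirical evaluation of FLoRA for Gradient Boosted Decision Trees on seven OpenML datasets demonstrates significant model accuracy improvements over the considered baseline, as well as robustness to an increasing number of parties involved in FL-HPO training.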
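The single-shot idea can be illustrated with a simplified version: each party evaluates a shared set of hyper-parameter configurations on its local data, only the resulting losses are sent to the server, and the server aggregates them into one global loss estimate and picks the best configuration for the single FL training run. The shared grid and the "mean" and "max" aggregation options below are simplifying assumptions; FLoRA itself works with loss surfaces over the hyper-parameter space, and this sketch is not its implementation.

```python
import numpy as np

def select_single_shot_config(configs, party_losses, how="mean"):
    """Pick one hyper-parameter configuration from per-party local losses.

    configs: list of K configurations shared by all parties.
    party_losses: (num_parties, K) array; entry [p, k] is party p's local
    validation loss for configs[k]. Aggregation options are assumptions.
    """
    losses = np.asarray(party_losses, float)
    if how == "mean":
        agg = losses.mean(axis=0)  # average-case loss across parties
    elif how == "max":
        agg = losses.max(axis=0)  # worst-case loss across parties
    else:
        raise ValueError(f"unknown aggregation: {how}")
    best = int(np.argmin(agg))
    return configs[best], float(agg[best])
```

Only one round of loss reports is exchanged before training, which is what keeps the communication overhead close to that of plain FL training. The choice of aggregation matters: averaging favors configurations that work well overall, while the maximum favors configurations that work acceptably for every party.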